Search results for "Data format"
Showing 10 of 10 documents
A comparison of HDFS compact data formats: Avro versus Parquet
2017
In this paper, compact file formats such as Avro and Parquet are compared with plain text formats to evaluate data query performance. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments presented in this article. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text formats because of their binary encoding and compression advantage. Furthermore, data queries against the column-based Parquet format are faster than against text formats and Avro.
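A minimal sketch of the kind of comparison the paper performs, assuming pandas, fastavro, and pyarrow are available; the toy schema and row count are illustrative, not the paper's workload:

    import os
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq
    from fastavro import writer, parse_schema

    rows = [{"id": i, "value": i * 0.5} for i in range(100_000)]
    df = pd.DataFrame(rows)

    # Plain text baseline.
    df.to_csv("data.csv", index=False)

    # Avro: row-oriented binary format with an explicit schema.
    schema = parse_schema({
        "name": "Record", "type": "record",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "value", "type": "double"}],
    })
    with open("data.avro", "wb") as out:
        writer(out, schema, rows, codec="deflate")

    # Parquet: column-oriented binary format with compression.
    pq.write_table(pa.Table.from_pandas(df), "data.parquet", compression="snappy")

    for path in ("data.csv", "data.avro", "data.parquet"):
        print(path, os.path.getsize(path), "bytes")

On data like this, the binary formats typically land well under the CSV size, and Parquet's columnar layout additionally favors column-selective queries.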
The problem of interoperability: A common data format for quantum chemistry codes
2007
A common format for quantum chemistry (QC), enhancing code interoperability and communication between different programs, has been designed and implemented. An XML-based format, QC-ML, is presented for representing quantities such as geometry, basis set, and so on, while an HDF5-based format is presented for the storage of large binary data files. Some preliminary applications that use the format have been implemented and are also described. This activity was carried out within the COST in Chemistry D23 project “MetaChem,” in the Working Group “A meta-laboratory for code integration in ab initio methods.” © 2007 Wiley Periodicals, Inc. Int J Quantum Chem, 2007
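A minimal sketch of the split the paper describes — small structured quantities in XML, large binary arrays in HDF5 — assuming h5py and NumPy; the element and dataset names are illustrative, not the actual QC-ML schema:

    import h5py
    import numpy as np
    import xml.etree.ElementTree as ET

    # XML side: small, human-readable structured quantities (geometry here).
    mol = ET.Element("molecule")
    ET.SubElement(mol, "atom", symbol="O", x="0.0", y="0.0", z="0.0")
    ET.ElementTree(mol).write("geometry.xml")

    # HDF5 side: large binary data, e.g. a four-index integral tensor.
    with h5py.File("integrals.h5", "w") as f:
        f.create_dataset("eri", data=np.random.rand(10, 10, 10, 10),
                         compression="gzip")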
Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems
2014
The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies, providing a simple and efficient placement scheme that scales to exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations…
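A simplified illustration of the interval-table idea behind Random Slicing, under the assumption that the hash space [0, 1) is cut into intervals each owned by one device; the interval reshaping performed on device insertions and removals is omitted:

    import bisect
    import hashlib

    # Small table of (interval_start, device) pairs covering [0, 1).
    intervals = [(0.0, "disk0"), (0.25, "disk1"), (0.5, "disk2"), (0.75, "disk3")]
    starts = [s for s, _ in intervals]

    def locate(key: str) -> str:
        """Map a key to the device owning the interval its hash falls into."""
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        x = h / 2**256  # uniform position in [0, 1)
        return intervals[bisect.bisect_right(starts, x) - 1][1]

    print(locate("block-42"))

Because lookup is a binary search over a small interval table, locating an item stays cheap no matter how many devices have come and gone.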
Energy and environmental benefits in public buildings as a result of retrofit actions
2011
The paper presents the results of an energy and environmental assessment of a set of retrofit actions implemented in the framework of the EU Project “BRITA in PuBs” (Bringing Retrofit Innovation to Application in Public Buildings – no: TREN/04/FP6EN/S07.31038/503135). Outcomes arise from a life cycle approach focused on the following issues: (i) construction materials and components used during retrofits; (ii) main components of conventional and renewable energy systems; (iii) impacts related to the building construction, for the different elements and the whole building. The results are presented according to the data format of the Environmental Product Declaration. Synthetic indi…
Workflow-Based Decision Support for Failure Mode and Effects Analysis
2010
To achieve high quality designs, processes, and services that meet or exceed industry standards, it is crucial to identify all potential failures throughout a system and work to minimize or prevent their occurrence or effects. This paper presents an innovative approach to Failure Mode and Effects Analysis (FMEA) that uses a Decision Support System (DSS) for supporting the FMEA processes. The DSS is powered by a workflow engine that guides the users through the processes by considering standard work templates or previous similar cases. It is also built as a framework for decision support tools so, besides its default one, different FMEA work instruments can be plugged in and used thr…
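The abstract does not detail the scoring the DSS applies, but FMEA workflows classically rank failure modes by the Risk Priority Number, RPN = severity × occurrence × detection; a sketch with made-up ratings:

    # Each failure mode is rated 1-10 on severity (S), occurrence (O),
    # and detection (D); the entries below are illustrative.
    failure_modes = [
        {"mode": "seal leak",      "S": 8, "O": 3, "D": 4},
        {"mode": "sensor drift",   "S": 5, "O": 6, "D": 7},
        {"mode": "connector wear", "S": 4, "O": 4, "D": 2},
    ]

    for fm in failure_modes:
        fm["RPN"] = fm["S"] * fm["O"] * fm["D"]

    # Highest-risk failure modes first.
    for fm in sorted(failure_modes, key=lambda fm: fm["RPN"], reverse=True):
        print(f'{fm["mode"]}: RPN={fm["RPN"]}')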
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
2019
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly deve…
OpenTIMS, TimsPy, and TimsR: Open and Easy Access to timsTOF Raw Data
2021
The Bruker timsTOF Pro is an instrument that couples trapped ion mobility spectrometry (TIMS) to high-resolution time-of-flight (TOF) mass spectrometry (MS). For proteomics, lipidomics, and metabolomics applications, the instrument is typically interfaced with a liquid chromatography (LC) system. The resulting LC-TIMS-MS data sets are, in general, several gigabytes in size and are stored in the proprietary Bruker Tims data format (TDF). The raw data can be accessed using proprietary binaries in C, C++, and Python on Windows and Linux operating systems. Here we introduce a suite of computer programs for data accession, including OpenTIMS, TimsR, and TimsPy. OpenTIMS is a C++ library capable …
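For context, the TDF layout pairs an SQLite metadata database (analysis.tdf) with a binary frame file (analysis.tdf_bin). Assuming that standard layout, the metadata side can already be inspected with the Python standard library; decoding the binary frames is what OpenTIMS, TimsPy, and TimsR provide on top:

    import sqlite3

    # Path is illustrative; analysis.tdf lives inside the Bruker .d folder.
    con = sqlite3.connect("sample.d/analysis.tdf")
    # Listing the tables shows which metadata (frame descriptions,
    # calibration, acquisition settings, ...) is queryable without
    # touching the proprietary binary file.
    for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        print(name)
    con.close()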
Lone Star Stack: Architecture of a Disk-Based Archival System
2014
The need for huge storage systems rises with the ever-growing creation of data. With growing capacities and shrinking prices, "write once read sometimes" workloads become more common. New data is constantly added, rarely updated or deleted, and every stored byte might be read at any time - a common pattern for digital archives or big data scenarios. We present the Lone Star Stack, a disk-based archival storage building block that is optimized for high reliability and energy efficiency. It provides a POSIX file system interface that uses flash-based storage for write-offloading and metadata, and the disk-based Lone Star RAID for user data storage. The RAID attempts to spin down disks a…
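A conceptual sketch of the write-offloading pattern described here: writes are absorbed by flash so the archival disks can stay spun down, then flushed in batches. The threshold and data structures are illustrative, not the Lone Star Stack implementation:

    class WriteOffloadStore:
        def __init__(self, flush_threshold: int = 4):
            self.flash_log = []          # staged (key, data) writes on flash
            self.disk = {}               # disk-backed archival store
            self.disks_spinning = False

        def write(self, key, data):
            self.flash_log.append((key, data))
            if len(self.flash_log) >= self.flush_threshold:
                self._flush()

        def _flush(self):
            self.disks_spinning = True   # spin up once per batch, not per write
            self.disk.update(self.flash_log)
            self.flash_log.clear()
            self.disks_spinning = False  # spin back down

    store = WriteOffloadStore()
    for i in range(5):
        store.write(f"object-{i}", b"payload")
    print(len(store.disk), "objects flushed,", len(store.flash_log), "still staged")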
LoneStar RAID
2016
The need for huge storage archives rises with the ever-growing creation of data. With today’s big data and data analytics applications, some of these huge archives become active in the sense that all stored data can be accessed at any time. Running and evolving these archives is a constant tradeoff between performance, capacity, and price. We present the LoneStar RAID, a disk-based storage architecture, which focuses on high reliability, low energy consumption, and cheap reads. It is designed for MAID systems with up to hundreds of disk drives per server and is optimized for “write once, read sometimes” workloads. We use dedicated data and parity disks, and export the data disks as individu…
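The dedicated data and parity disks follow the familiar single-parity construction, where the parity block is the XOR of the data blocks and any one lost block is recoverable; a sketch of that underlying idea (the actual LoneStar RAID layout may differ):

    from functools import reduce

    def xor_blocks(blocks):
        """XOR same-sized blocks byte by byte."""
        return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

    data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data disks
    parity = xor_blocks(data)            # block on the dedicated parity disk

    # Reconstruct a lost data block from the survivors plus parity.
    recovered = xor_blocks([data[0], data[2], parity])
    assert recovered == data[1]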
Code Interoperability and Standard Data Formats in Quantum Chemistry and Quantum Dynamics: The Q5/Q5cost Data Model
2014
Code interoperability and the search for domain-specific standard data formats represent critical issues in many areas of computational science. The advent of novel computing infrastructures such as computational grids and clouds makes these issues even more urgent. The design and implementation of a common data format for quantum chemistry (QC) and quantum dynamics (QD) computer programs is discussed with reference to the research performed in the course of two Collaboration in Science and Technology Actions. The specific data models adopted, Q5Cost and D5Cost, are shown to work for a number of interoperating codes, regardless of the type and amount of information (small or large datasets) …